EVOLUTION

Searching for new branches on the tree of life 

Is there undiscovered life that differs fundamentally from that in the three known domains?



By Tanja Woyke and Edward M. Rubin 

Ever since Woese's seminal work nearly 40 years ago (1), life has been divided into three domains: Archaea, Bacteria, and Eukaryota. But could there be life that does not fit into any of these domains? Novel techniques for exploring microbes that cannot readily be grown in the laboratory offer hope that scientists can discover such life, if it exists (see the first figure). These methods include metagenomics, which involves sequencing DNA extracted from environmental samples, and single-cell genomics, in which individual microbial cells are isolated from the environment and their genomes are amplified and sequenced.

On the basis of these and other approaches, we propose that microbial life be operationally divided into three categories: explored, unexplored, and undiscovered. It is among the last of these that potential signs of additional branches on the tree of life beyond the three known domains may be found.

EXPLORED, UNEXPLORED, UNDISCOVERED. The explored category encompasses microorganisms that can be cultivated in the laboratory. The unexplored category includes uncultivated organisms present in environmental samples, whose existence is known only through their molecular signatures and occasionally from partial genome assemblies obtained through metagenomics (2) and single-cell genomics studies (3). The sequence of the 16S ribosomal RNA (rRNA) gene, which can be amplified from environmental samples with a set of "universal" primers, has been used extensively as a molecular signature to assess the microbial diversity in a given sample and to build phylogenetic trees. Taxa from the unexplored category dwarf the explored in both numbers and diversity (4). Beyond these organisms, for which we have ribosomal barcodes or other molecular signatures, resides the as yet undiscovered life: the putative organisms that have thus far eluded our detection.

Undiscovered life, if it exists, is either absent at the locations of existing environmental surveys or is missed by current approaches. There are reasons to believe that current approaches may indeed miss taxa, particularly if they are very different from those characterized so far. The "universal" primers used to detect 16S rRNA genes from bacteria and archaea in environmental samples can miss major lineages because of primer mismatches (5). Similarly, the selection of specific single cells from environmental samples for genome sequencing has been based on rRNA gene identity, and thus also relies on these universal primers. Organisms whose 16S rRNA genes are not recognized by the primers would not be detected using this approach. Past explorations of available metagenomic data sets have focused on finding matches to known genes and genomes, an analysis that is inherently biased against uncovering completely novel life. Finally, although we may soon have petabases of metagenomic sequence data, samples have been collected from only a minute fraction of Earth's countless different environments.
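The primer-mismatch problem can be made concrete with a small calculation. The minimal Python sketch below counts how many positions of a candidate binding site are not allowed by a degenerate primer written in IUPAC ambiguity codes, and slides the primer along a template to find its best site. The primer, the templates, and the one-mismatch detection threshold are hypothetical illustrations; real amplification also depends on where mismatches fall, on annealing temperature, and on the polymerase.

```python
# IUPAC nucleotide codes used in degenerate primers, mapped to the bases each allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer: str, site: str) -> int:
    """Count positions where the site's base is not permitted by the degenerate primer."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return sum(base not in IUPAC[p] for p, base in zip(primer.upper(), site.upper()))


def best_binding(primer: str, template: str) -> tuple[int, int]:
    """Slide the primer along the template; return (position, mismatches) of the best site."""
    k = len(primer)
    if len(template) < k:
        raise ValueError("template is shorter than the primer")
    scores = [(i, mismatches(primer, template[i:i + k])) for i in range(len(template) - k + 1)]
    return min(scores, key=lambda s: s[1])  # first site with the fewest mismatches


def is_detected(primer: str, template: str, max_mismatches: int = 1) -> bool:
    """Crude, hypothetical stand-in for amplification: some site is within the threshold."""
    return best_binding(primer, template)[1] <= max_mismatches


if __name__ == "__main__":
    primer = "GTGYCAGCMG"  # hypothetical degenerate primer
    print(best_binding(primer, "AAAAGTGCCAGCAGTTTT"))  # (4, 0): perfect match at position 4
    print(best_binding(primer, "AAAAGTGACAGTAGTTTT"))  # (4, 2): a divergent lineage's site
```

Under this toy rule, the second template would go undetected even though its site differs from the first at only two positions, which is the kind of lineage-level blind spot the text describes.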

Given these limitations, it is reasonable to speculate that undiscovered and highly divergent branches of life may exist, possibly represented by domains whose marker genes differ extensively from those of the bacterial and archaeal branches of the tree of life. Their discovery will require refined strategies, involving both the application of new approaches and access to previously unexplored habitats.

The search for new major branches on the tree of life. Cultivation-independent methods, novel sequencing technologies, and analytical approaches can be directed toward the detection of life outside currently established domains.

Parasitic remnants of a fourth domain? Electron microscopy images of (A) a Pandoravirus dulcis particle and (B) an Acanthamoeba polyphaga mimivirus particle. These viruses may be remnants of a fourth domain. Cellular entities from this domain either went extinct or remain to be found.

THE SEARCH FOR UNDISCOVERED LIFE. Approaches to further explore the diversity of microbial life will need to include the expansion and optimization of methods used to capture the genomes of unexplored (uncultivated) organisms (see the first figure). Single-cell sequencing with microfluidic and cell-sorting approaches, focused specifically on cells lacking amplifiable rRNA genes, is a high-throughput strategy to search for novel organisms. Massive-scale metagenomic sequencing of environmental DNA and RNA samples should, in principle, generate sequence data from any entity from which nucleic acids can be extracted. Analyzing such sequence data with new computational methods designed specifically to find outliers to previously defined life is another powerful means of exploring the unknown. Assembled contiguous stretches of sequence (contigs) generated from environmental samples can be mined for unusual features in nucleotide composition, transfer RNA structures, and codon usage, as well as for the phylogenetic placement of rRNA and other marker genes (6). This would facilitate the detection of biological outliers and genomic fragments with deep phylogenies.
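One of the contig-mining ideas above, screening for unusual nucleotide composition, can be sketched in a few lines. The minimal Python example below assumes tetranucleotide (4-mer) frequencies as the compositional signature, measures each contig's Euclidean distance from the sample's mean profile, and flags contigs with a large robust (median and MAD based) z-score. The signature, distance, and 3.5 cutoff are illustrative choices, not the method used in (6).

```python
from itertools import product
from math import sqrt
from statistics import median

BASES = "ACGT"


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Normalized counts of all 4**k k-mers; k-mers containing N or other symbols are skipped."""
    index = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1  # avoid dividing by zero for very short contigs
    return [c / total for c in counts]


def composition_outliers(contigs: dict[str, str], k: int = 4, cutoff: float = 3.5) -> list[str]:
    """Return names of contigs whose k-mer profile is unusually far from the sample centroid."""
    if not contigs:
        return []
    profiles = {name: kmer_profile(seq, k) for name, seq in contigs.items()}
    n = len(profiles)
    centroid = [sum(col) / n for col in zip(*profiles.values())]
    dist = {
        name: sqrt(sum((x - c) ** 2 for x, c in zip(p, centroid)))
        for name, p in profiles.items()
    }
    med = median(dist.values())
    mad = median(abs(d - med) for d in dist.values())
    if mad == 0:
        # Most contigs share one profile; anything farther than the median is unusual.
        return [name for name, d in dist.items() if d > med]
    # Robust z-score: 1.4826 * MAD approximates the standard deviation for normal data.
    return [name for name, d in dist.items() if (d - med) / (1.4826 * mad) > cutoff]
```

A median-based rule is used here because a single strongly divergent contig would inflate a mean and standard deviation and could mask itself; in practice, composition screens are usually combined with phylogenetic placement of marker genes rather than used alone.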

Finally, the application of single-molecule sequencing technologies that can recognize modified or nonstandard bases is another approach for detecting life that may differ from the life we know. Sequencing to date has mostly been limited to detecting the four canonical bases. Emerging techniques, such as single-molecule real-time DNA sequencing (7) and nanopore-based single-molecule sequencing (8), have the potential to allow the recognition and characterization of environmental organisms with base modifications and compositions distinct from the four standard bases and their currently described modifications.

Beyond new technologies and methodologies, the choice of suitable environmental niches will be critical in cataloging the diversity of life. Although discoveries may be made by mining existing data sets from explored environments, future searches for deeply branching clades should include inhospitable and isolated environments. These sites would be expected to be preferred niches for early life, potentially sheltered from more modern microbial competitors. They would include hypoxic subsurface sites with environmental conditions predating the Great Oxidation Event, which occurred about 2.3 billion years ago. Support for the idea that isolated hypoxic environments may be preferred niches for early life comes from observations that anaerobic niches deep within Earth's crust tend to harbor acetogens and methanogens, which represent ancient branches of autotrophs in the bacterial and archaeal domains (9).

Using some of the aforementioned strategies, scientists are already speculating on the possible discovery of a "fourth domain." Phylogenetic analysis of marker genes from metagenomic data revealed deep, novel branches thought to be occupied by either a cellular entity or novel viruses (6). Extremely large and unusual DNA viruses such as mimivirus and pandoravirus (see the second figure) (10) have also been shown to contain marker genes with deep phylogenetic roots between the archaeal and eukaryal domains (11). One interpretation of these findings is that an ancestral cellular lineage gave rise to these viruses but went extinct as a cellular entity, and that its genome now persists only as a so-called parasitic fourth domain (11). It is possible, however, that this cellular precursor has simply not yet been detected and still exists, awaiting discovery. Even more speculative is the idea that an RNA world still exists in a niche of favorable conditions. A recent survey of the distribution of viruses among the domains of life found RNA viruses to be associated solely with bacteria and eukaryotes; the authors hypothesize that these viruses are remnants of an RNA world (12). Under the appropriate conditions, cellular entities with RNA genomes may still persist.

A BRAVE NEW WORLD. While the search for new life focuses on organisms that exist in nature, a parallel effort is under way to create fundamentally new organisms in the laboratory. Organisms with an expanded genetic alphabet of six nucleotides (13) and with noncanonical codon usage (14) have already been built, representing human-designed variations of existing branches. This quest of synthetic biologists to build radically novel organisms also offers possible models for unusual varieties of life that may be sought in nature. A recent computational analysis of a global metagenomic data set revealed an unexpected abundance of environmental organisms with diverse noncanonical codon usage in nature (15), a feature that has been targeted for creation in the laboratory through genome engineering (14).

An advanced toolkit of powerful genomic technologies is now poised to generate and mine increasingly large data sets for hints of life that differs strikingly from the life cataloged thus far. The discovery of new building blocks and of organisms from a new domain would likely have major implications for biotechnology, agriculture, human health, and synthetic biology. It might also elucidate the early evolution of the domains and their divergence from the last universal common ancestor. Regardless of the depth at which newly discovered branches may be anchored in the tree of life, the quest to find them will likely reveal unexpected and valuable insights into the fruits of more than 3 billion years of biological tinkering. ■